Editor's note: This article appeared during March Madness in 2016. The teams and the stakeholders may have changed, but statistics hold true.
March Madness is upon us as experts and amateurs alike make their selections for NCAA basketball tournament winners — and increasingly, some are using Big Data to help them.
Participants competing for a cash prize or simply a year’s worth of bragging rights have until just before the first matchup (Duke v. UNCW), around noon today, to submit their brackets.
Some make their picks using their gut. Others choose winning teams based on their mascot. And then there are those who choose their alma mater to take it all even though statisticians gave it a less than 1% chance of winning the tournament. (Hint: That was the author, last year.)
Just like forecasting other complex events, though, computer scientists can use analytics to try to create the ideal bracket. But an old saying holds true with such efforts: Garbage in, garbage out.
"Data science is not much use if you don’t have good data to begin with," said V.S. Subrahmanian, a computer science professor at the University of Maryland. Subrahmanian has never participated in March Madness bracketology, but he is an expert in Big Data analytics. He has used data science to successfully predict where insurgents will plant improvised explosive devices in Iraq and Afghanistan for the Department of Defense and has also used predictive analytics to help curb the poaching of rhinos and elephants in Africa.
According to Subrahmanian, a successful use of data analytics for the tournament will involve gathering data for every team, player and potential reserve in the tournament. For each individual on the court, the successful analyst needs to take into account factors such as height, weight, speed, field goal percentage, past playing time and vertical leaping ability, to name a few. The list goes on and on.
Once all the data points are collected, you have to factor in variables. A single player could bring at least a dozen variables, Subrahmanian said — such as how a coach uses a player and the on-court dynamics of a team.
On top of all this, teams in the tournament may not have played each other earlier in the season, so there isn’t a recent history of individual competitions to draw upon.
Past tournaments are littered with upsets and Cinderella stories, forcing data scientists to take into account a team’s propensity to be upset and its ability to perform under pressure while on national television.
The odds of having a perfect bracket are as low as about 1 in 9.2 quintillion, although experts are divided on this front and some believe the odds could be as "high" as 1 in 128 billion. That is a lot of zeros (1 quintillion is a 1 followed by 18 zeros). As the NCAA points out, it is significantly more likely that you will find a four-leaf clover (1 in 10,000), win an Oscar (1 in 11,500) or become president (1 in 10,000,000).
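For readers who want to check the arithmetic, here is a minimal sketch of where a figure like 9.2 quintillion can come from. It assumes a 63-game bracket (the play-in games are left out) and treats every pick as independent with the same chance of being right. Both are simplifying assumptions, and the function name and the two-thirds accuracy figure are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

# Assumption: a 64-team single-elimination bracket has 63 games to pick.
GAMES = 63


def perfect_bracket_odds(p_correct):
    """Return N, where the chance of a perfect bracket is 1 in N.

    Assumes each of the 63 picks is independent and is correct with
    probability p_correct (a simplification; real games are not coin flips).
    """
    return 1 / Fraction(p_correct) ** GAMES


# Pure coin flips: every pick has a 1-in-2 chance of being right.
coin_flip = perfect_bracket_odds(Fraction(1, 2))
print(f"Coin-flip picks: 1 in {int(coin_flip):,}")

# Hypothetical informed picker who gets two of every three games right.
informed = perfect_bracket_odds(Fraction(2, 3))
print(f"Two-thirds accuracy: 1 in {int(informed):,}")
```

The coin-flip case works out to 2 to the 63rd power, which is about 9.2 quintillion. Under the hypothetical two-thirds accuracy the odds improve to roughly one in a hundred-odd billion, the same order of magnitude as the more optimistic estimate, though that is not necessarily how the experts arrived at their number.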
Still, using advanced analytics, thorough data sets and the proper calibration of both parameters and variables, people can give themselves the best chance of winning their bracket competitions.
"Nobody can claim to predict perfectly and I doubt that is going to change over the next 10 years," Subrahmanian said. On average, "we can predict much more accurately now ... it's getting better every day."
But, "getting it all right? Even the best data science program is going to need some luck," he said.
As a consolation for spelling out why a perfect bracket is so unlikely, here are some resources to assist you in your last-minute bracketology endeavors:
FiveThirtyEight
Since 2011, the statistical analysis website FiveThirtyEight has issued probability forecasts for NCAA March Madness tournaments. For the first time, this year the site will provide live updates of a team’s chance of winning and advancing during a game.
The site offers round-by-round probabilities, so you can use its statistical modeling to help educate your gut instinct.
The website uses play-by-play data from the past five seasons of Division I NCAA basketball for its model, including data points like pre-game win probabilities, score difference and the time left in the game, to offer live probability updates.
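To give a feel for how those three inputs can combine, here is a rough sketch of a live win probability. This is not FiveThirtyEight's model. It is a textbook-style approximation that treats the rest of the game as a random walk in the score, and the function name and the 11-point spread in final margins are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

# Assumption: final scoring margins vary with a standard deviation of
# about 11 points. This value is illustrative, not fitted to real data.
SIGMA = 11.0


def live_win_probability(pregame_prob, lead, seconds_left, game_seconds=2400):
    """Estimate a team's chance of winning from three inputs.

    pregame_prob: the team's win probability before tip-off (between 0 and 1)
    lead:         current score difference (negative if trailing)
    seconds_left: time remaining (a 40-minute college game is 2,400 seconds)
    """
    frac = seconds_left / game_seconds
    if frac <= 0:
        # Game over: the leader has won; a tie is treated as a toss-up.
        return 1.0 if lead > 0 else 0.0 if lead < 0 else 0.5
    # Translate the pre-game probability into an expected full-game margin.
    expected_margin = SIGMA * STD_NORMAL.inv_cdf(pregame_prob)
    # Expected final margin, measured in standard deviations of what remains.
    z = (lead + expected_margin * frac) / (SIGMA * sqrt(frac))
    return STD_NORMAL.cdf(z)


# Example: a 70% pre-game favorite trailing by 4 at halftime.
print(round(live_win_probability(0.70, -4, 1200), 3))
```

At tip-off with the score tied, the sketch simply returns the pre-game probability. As the clock runs down, the current lead counts for more and the pre-game edge counts for less, which is the general behavior any live forecast needs.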
More than two dozen teams have less than a 0.1% chance of winning the tournament, according to FiveThirtyEight. Kansas and North Carolina are most likely to win the men’s tournament, with a 19.1% and 15% probability of winning, respectively. For the women? Connecticut has a 70% chance of winning the tournament while vying for its fourth straight national championship.
Bing
Microsoft’s search engine Bing also shows game-by-game probabilities, even briefly explaining why one team could beat another. Its model also predicts some men’s tournament upsets, like the potential for No. 11 seed Gonzaga to beat No. 6 seed Seton Hall "because they play much more disciplined basketball."
In 2015, Bing’s bracket was in the top 30% of all national brackets and also beat brackets from Google, Facebook and Sports Illustrated, according to Bing’s blog.
The four No. 1 seeds were a bit up for grabs this year, as their combined 23 losses are the most of any group of No. 1 seeds. But, like FiveThirtyEight, Bing predicts that Kansas and North Carolina will make it to the finals, with Kansas winning it all.
Oval Office "bracketology"
If you don't want to use data analysis, perhaps you can turn to the picks of the world's most powerful political leader. President Barack Obama, whose love of basketball (and the NCAA tournament) is no secret, is hoping to leave office having accurately picked the winner in both his first and last years in office. In 2009, he correctly picked North Carolina as the men's tournament winner -- but in the six brackets since, the president has not picked the winner.
Obama knows a thing or two about basketball, "demonstrating an easy knowledge of the game," according to The New Yorker. This year he sat down with ESPN to fill out his bracket and said he is more likely to pick the coaches when making his selection.
This year, as he did in 2010 and 2011, Obama chose Kansas to take the title.
"I think the Jayhawks in a squeaker get past UNC," he said.